Abstract

We develop asymptotically optimal policies for the multi-armed bandit (MAB) problem under
a cost constraint. This model is applicable in situations where each sample (or activation)
from a population (bandit) incurs a known bandit-dependent cost. Successive samples from
each population are iid random variables with unknown distribution. The objective is to design
a feasible policy for deciding from which population to sample, so as to maximize the
expected sum of outcomes of n total samples or, equivalently, to minimize the regret due to
lack of information on the sample distributions. For this problem we consider the class of feasible
uniformly fast (f-UF) convergent policies that satisfy the cost constraint sample-path wise. We
first establish a necessary asymptotic lower bound for the rate of increase of the regret function
of f-UF policies. Then we construct a class of f-UF policies and provide conditions under which
they are asymptotically optimal within the class of f-UF policies, achieving this asymptotic
lower bound. At the end we provide the explicit form of such policies for the case in which the
unknown distributions are Normal with unknown means and known variances.

Keywords: Inflated Sample Means, Upper Confidence Bound, Multi-armed Bandits, Sequential 
Allocation 

1. Introduction 

Consider the problem of sequential sampling from a finite number of independent statistical 
populations, where successive samples from a population are iid random variables with unknown 
distribution. 

Consider the problem of sequential sampling from $k$ independent statistical populations, $\Pi_i$,
$i = 1, \dots, k$. Successive samples from population $i$ constitute a sequence of i.i.d. random variables
$X^i_1, X^i_2, \dots$ following a univariate distribution with density $f_i(\cdot \mid \theta_i)$ with respect to a nondegenerate
measure $\nu$. The density $f_i(\cdot \mid \cdot)$ is known and $\theta_i$ is a parameter belonging to some set $\Theta_i$. Let
$\underline{\theta} = (\theta_1, \dots, \theta_k)$ denote the set of parameters, $\underline{\theta} \in \Theta$, where $\Theta = \Theta_1 \times \dots \times \Theta_k$. Given $\underline{\theta}$, let
$\underline{\mu}(\underline{\theta}) = (\mu_1(\theta_1), \dots, \mu_k(\theta_k))$ be the vector of expected values, i.e. $\mu_i(\theta_i) = E_{\theta_i}(X^i)$. The true value $\underline{\theta}^0$ of $\underline{\theta}$ is
unknown. We make the assumption that outcomes from different populations are independent.
Sampling from population $\Pi_i$ incurs a positive cost $c^i$ per sample, and without loss of generality
we assume $c^1 \le c^2 \le \dots \le c^k$, and not all $c^i$ are equal. The objective is to maximize the expected
average reward per period subject to the constraint that the long-run average sampling cost per
period does not exceed a given upper bound $c^0$ for each period. Without loss of generality we
assume $c^1 \le c^0 < c^k$. In the case where $c^0 < c^1$ the problem is infeasible, while in the other case,
where $c^0 \ge c^k$, the cost constraint is redundant. Let $d = \max\{j : c^j \le c^0\}$. Then $1 \le d < k$
and $c^d \le c^0 < c^{d+1}$. We consider adaptive policies which depend only on the past observations of
selections and outcomes. Specifically, let $A_t, X_t$, $t = 1, 2, \dots$, denote the population selected and
the observed outcome at period $t$. Let $H_t = (A_1, X_1, \dots, A_{t-1}, X_{t-1})$ denote the history of actions
and observations available at period $t$. An adaptive policy is a sequence $\pi = (\pi_1, \pi_2, \dots)$ of history
dependent probability distributions on $\{1, \dots, k\}$, such that $\pi_n(j, h_n) = P(A_n = j \mid h_n)$ for a given
realization $h_n$ of $H_n$. Given $h_n$, let $T^\alpha_\pi(n)$ denote the number of times population $\alpha$ has been sampled
during the first $n$ periods, $T^\alpha_\pi(n) = \sum_{t=1}^{n} 1\{A_t = \alpha\}$. Let $V_\pi(n)$ and $C_\pi(n)$ be respectively the total
reward earned and total cost incurred up to period $n$, i.e.,

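$$V_\pi(n) = \sum_{i=1}^{k} \sum_{t=1}^{T^i_\pi(n)} X^i_t, \tag{1}$$

$$C_\pi(n) = \sum_{i=1}^{k} \sum_{t=1}^{T^i_\pi(n)} c^i. \tag{2}$$

We call an adaptive policy feasible if

$$C_\pi(n)/n \le c^0, \quad \forall\, n = 1, 2, \dots \tag{3}$$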
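The sample-path constraint of Eq. (3) can be checked mechanically for any realized sequence of selections. The following is a minimal sketch and not part of the original development; the function name `is_feasible_path`, the 0-based population indices, and the numerical costs are illustrative assumptions.

```python
from itertools import accumulate


def is_feasible_path(costs, actions, c0):
    """Return True if the running average cost never exceeds c0.

    costs   -- per-sample cost of each population (0-based index)
    actions -- sequence of sampled populations A_1, A_2, ...
    c0      -- cost budget per period
    """
    running = accumulate(costs[a] for a in actions)
    # C(n) <= n * c0 must hold for every prefix length n, not only at the end.
    return all(total <= n * c0 for n, total in enumerate(running, start=1))


costs = [1, 4]   # hypothetical costs of a cheap and an expensive population
c0 = 2           # hypothetical budget
print(is_feasible_path(costs, [0, 0, 1], c0))  # cheap samples first
print(is_feasible_path(costs, [1, 0, 0], c0))  # expensive sample first
```

Both sequences have the same average cost over three periods, but only the first one respects the constraint in every period, which is the distinction the sample-path requirement is meant to capture.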


The objective is to obtain a feasible policy $\pi$ that maximizes in some sense $E_{\underline{\theta}} V_\pi(n)$, $\forall\, \underline{\theta} \in \Theta$.
In the next section we will show that this is equivalent to minimizing a regret function $R_\pi(\underline{\theta}, n)$
that represents the expected loss due to lack of information on the sample distributions. For this,
we consider the class of feasible policies that are uniformly fast (UF) convergent, in the sense of
Burnetas and Katehakis (1996); we call these policies f-UF policies. We first establish in Theorem
1 a necessary asymptotic lower bound for the rate of increase of the regret function of f-UF policies.
Then we construct a class of “block f-UF” policies and provide conditions under which they are
asymptotically optimal within the class of f-UF policies, achieving this asymptotic lower bound, cf.
Theorem 2. At the end we provide the explicit form of an asymptotically optimal f-UF policy for
the case in which the unknown distributions are Normal with unknown means and known variances.
These policies form the basis for deriving logarithmic regret policies for more general models, cf.
Auer et al. (2002), Auer and Ortner (2010), Cowan et al. (2015), Cowan and Katehakis (2015a).


The extensive literature on the multi-armed bandit (MAB) problem includes the following:
Lai and Robbins (1985), Katehakis and Robbins (1995), Kleinberg (2004), Mahajan and Teneketzis
(2008), Audibert et al. (2009), Auer and Ortner (2010), Honda and Takemura (2011), Bubeck and Slivkins
(2012), Cowan and Katehakis (2015) and references therein. As far as we know, the first formulation [...]
(1998). Tran-Thanh et al. (2010) considered the problem when the cost of activation of each arm is
fixed and becomes known after the arm is used once. Burnetas and Kanavetas (2012) considered a
version of this problem and constructed a consistent policy (i.e., with regret $R_\pi(n) = o(n)$). In the
present paper we employ a stricter version of the average cost constraint that requires the average
sampling cost not to exceed $c^0$ at any time period and not only in the limit. Badanidiyuru et al.
(2013) considered the problem where there can be more than one side constraint (“knapsack”)
and showed how to construct policies with sub-linear regret. They also discuss interesting applications
of the model, such as problems of dynamic pricing, Wang et al. (2014), Johnson et al.
(2015), dynamic procurement, Singla and Krause (2013), and auctions, Tran-Thanh et al. (2014).
Ding et al. (2013) constructed UF policies (i.e., with regret $R_\pi(n) = O(\log n)$) for cases in which
activation costs are bandit dependent iid random variables. For other recent related work we refer
to: Guha and Munagala (2007), Tran-Thanh et al. (2012), Thomaidou et al. (2012), Lattimore et al.


For other work in this area we refer to Katehakis and Derman (1986), Katehakis and Veinott Jr
(1987), Burnetas and Katehakis (1993), Burnetas and Katehakis (1996a), Lagoudakis and Parr (2003),
Bartlett and Tewari (2009), Tekin and Liu (2012), Jouini et al. (2009), Dayanik et al. (2013), Filippi et al.
(2010), Osband and Van Roy (2014), as well as Burnetas and Katehakis (2003), Audibert et al.
(2009), Auer and Ortner (2010), Gittins et al. (2011), Bubeck and Slivkins (2012), Cappé et al. (2013),
Kaufmann (2015), Li et al. (2014), Cowan and Katehakis (2015b), Cowan and Katehakis (2015c),
and references therein. For dynamic programming extensions we refer to Burnetas and Katehakis
(1997), Butenko et al. (2003), Tewari and Bartlett (2008), Audibert et al. (2009), Littman (2012),
Feinberg et al. (2014) and references therein.


2. Model description - Preliminaries 

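The complete information problem, in which $\underline{\theta}$ is known, the expected average reward is to be
maximized, and the expected average cost must not exceed $c^0$, can be solved via the following linear
program (LP-1), which is instrumental in the development of the lower bounds and the asymptotically
optimal policy.

$$
\begin{aligned}
\text{(LP-1):}\quad z^*(\underline{\theta}) = \max\ & \sum_{j=1}^{k} \mu_j(\theta_j)\, x_j \\
\text{s.t.}\ & \sum_{j=1}^{k} c^j x_j + y = c^0, \\
& \sum_{j=1}^{k} x_j = 1, \\
& x_j \ge 0,\ \forall j, \quad y \ge 0.
\end{aligned}
\tag{4}
$$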


The solution is a randomized sampling policy which at each period selects population $j$ with probability
$x_j$, for $j = 1, \dots, k$, where the randomization probabilities are an optimal solution to the above
linear program (LP), cf. Burnetas and Kanavetas (2012); Burnetas and Katehakis (1998). However,
such a policy may not be feasible in our framework, which requires $C_\pi(n)/n \le c^0$, $\forall\, n = 1, 2, \dots$,
because simple randomization may lead to sampling in such a way that $C_\pi(n)/n$ exceeds $c^0$ for some
periods. In the complete information setting, under the assumption that the coefficients
are all rational, any optimal solution of LP-1 which is an extreme point is also rational. Thus an
optimal randomized policy can be implemented as a periodic sampling policy within blocks of time
periods, within which the order of sampling can be set so that the sampling cost constraint is never
violated and the sampling frequencies remain equal to $x_j$. We use generalizations of this idea in the
incomplete information framework in the sequel.

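We next introduce necessary notation regarding LP-1. First, its dual problem (DLP-1) is

$$
\begin{aligned}
\text{(DLP-1):}\quad z^*_D(\underline{\theta}) = \min\ & g + c^0 \lambda \\
\text{s.t.}\ & g + c^1 \lambda \ge \mu_1(\theta_1), \\
& \qquad \vdots \\
& g + c^k \lambda \ge \mu_k(\theta_k), \\
& g \in \mathbb{R}, \quad \lambda \ge 0.
\end{aligned}
$$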
A basic matrix $B$ is of the form

$$
B = \begin{pmatrix} c^i & c^j \\ 1 & 1 \end{pmatrix} \ \text{for some } i \le d < j,
\qquad \text{or} \qquad
B = \begin{pmatrix} c^i & 1 \\ 1 & 0 \end{pmatrix} \ \text{for some } i \le d.
$$

They correspond to sampling from the pair $\{i, j\}$ or population $i$, respectively. We denote the Basic
Feasible Solution (BFS) corresponding to matrix $B$ as $b = \{i, j\}$ or $b = \{i\}$, respectively. Note that
in the case of a degenerate BFS $b$, more than one matrix $B$ may correspond to the same $b$.

We use $F$ to denote the set of BFS:

$$F = \{b : b = \{i, j\},\ i \le d < j \ \text{ or } \ b = \{i\},\ i \le d\}.$$

Since the feasible region of Eq. (4) is bounded, $F$ is a finite set.

For a basic matrix $B$, let $v^B = (\lambda^B, g^B)$ denote the dual vector corresponding to $B$, i.e., $v^B = \mu_B(\underline{\theta}) B^{-1}$,
where $\mu_B(\underline{\theta}) = (\mu_i(\theta_i), \mu_j(\theta_j))$ or $(\mu_i(\theta_i), 0)$, depending on the form of $B$.

Regarding optimality, a BFS is optimal if and only if for at least one corresponding basic matrix
$B$ the reduced costs (dual slacks) are all nonnegative:

$$\phi^B_\alpha(\underline{\theta}) = c^\alpha \lambda^B + g^B - \mu_\alpha(\theta_\alpha) \ge 0, \quad \alpha = 1, \dots, k.$$

A basic matrix $B$ satisfying this condition is optimal. It is easy to show that the reduced cost
can be expressed as a linear combination of the unknown population means, i.e., $\phi^B_\alpha(\underline{\theta}) = w^B_\alpha\, \underline{\mu}(\underline{\theta})$,
where $w^B_\alpha$ is an appropriately defined vector that does not depend on $\underline{\mu}(\underline{\theta})$. In the sequel we use the
notation $s(\underline{\theta})$ to denote the set of optimal solutions of LP-1 for a vector $\underline{\mu}(\underline{\theta})$, i.e., $s(\underline{\theta}) = \{b \in F :
b$ corresponds to an optimal BFS$\}$.
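The dual vector and the reduced costs of a given basis follow directly from the two forms of the basic matrix. The sketch below illustrates this computation under stated assumptions: populations are 0-indexed, costs are integers, the caller supplies a valid basis, and the function names and numerical data are hypothetical, not taken from the paper.

```python
from fractions import Fraction


def dual_and_reduced_costs(mu, c, basis):
    """Dual vector (lam, g) and reduced costs phi for a basis.

    basis is (i, j) for a pair with c[i] <= c0 < c[j], or (i,) for a
    single population with c[i] <= c0 (the slack y is then basic).
    """
    if len(basis) == 2:
        i, j = basis
        # Both dual constraints are tight: g + c[i]*lam = mu[i], g + c[j]*lam = mu[j].
        lam = Fraction(mu[j] - mu[i]) / (c[j] - c[i])
        g = mu[i] - lam * c[i]
    else:
        (i,) = basis
        lam = Fraction(0)      # slack variable basic, so lambda = 0
        g = Fraction(mu[i])
    phi = [c[a] * lam + g - mu[a] for a in range(len(mu))]
    return lam, g, phi


def is_optimal(mu, c, basis):
    """Optimality test: dual feasibility (lam >= 0 and all phi >= 0)."""
    lam, _, phi = dual_and_reduced_costs(mu, c, basis)
    return lam >= 0 and all(p >= 0 for p in phi)


mu, c = [1, 2, 5], [1, 2, 4]   # hypothetical means and costs, budget c0 = 3
print(dual_and_reduced_costs(mu, c, (0, 2)))
print(is_optimal(mu, c, (0, 2)), is_optimal(mu, c, (1, 2)))
```

Exact rational arithmetic is used so that zero reduced costs of basic populations are not blurred by rounding.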

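We define the loss or regret function of policy $\pi$ as the finite horizon loss in expected reward
with respect to the optimal policy under complete information:

$$
R_\pi(\underline{\theta}, n) = n z^*(\underline{\theta}) - E_{\underline{\theta}} V_\pi(n)
= n z^*(\underline{\theta}) - \sum_{j=1}^{k} \mu_j(\theta_j)\, E_{\underline{\theta}} T^j_\pi(n). \tag{5}
$$

We next derive an equivalent expression that relates the regret to the solution of the complete
information LP. Recall that for any basic matrix $B$ which corresponds to an optimal solution of LP-1,
from the DLP-1 program we have that $z^*(\underline{\theta}) = c^0 \lambda^B + g^B$ and $\mu_j(\theta_j) = c^j \lambda^B + g^B - \phi^B_j(\underline{\theta})$, $\forall j$.

These relations and Eq. (5) imply:

$$
R_\pi(\underline{\theta}, n) = \sum_{j=1}^{k} \phi^B_j(\underline{\theta})\, E_{\underline{\theta}} T^j_\pi(n)
+ \lambda^B \sum_{j=1}^{k} (c^0 - c^j)\, E_{\underline{\theta}} T^j_\pi(n), \tag{6}
$$

for any $\underline{\theta} \in \Theta$.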
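The intermediate algebra between Eq. (5) and Eq. (6) is short; a sketch of it, assuming only the two dual relations stated above and the identity $\sum_{j} T^j_\pi(n) = n$, is:

```latex
\begin{align*}
R_\pi(\underline{\theta}, n)
  &= n\,(c^0 \lambda^B + g^B)
     - \sum_{j=1}^{k} \bigl(c^j \lambda^B + g^B - \phi^B_j(\underline{\theta})\bigr)\,
       E_{\underline{\theta}} T^j_\pi(n) \\
  &= \sum_{j=1}^{k} \phi^B_j(\underline{\theta})\, E_{\underline{\theta}} T^j_\pi(n)
     + \lambda^B \Bigl(n c^0 - \sum_{j=1}^{k} c^j E_{\underline{\theta}} T^j_\pi(n)\Bigr)
     + g^B \Bigl(n - \sum_{j=1}^{k} E_{\underline{\theta}} T^j_\pi(n)\Bigr) \\
  &= \sum_{j=1}^{k} \phi^B_j(\underline{\theta})\, E_{\underline{\theta}} T^j_\pi(n)
     + \lambda^B \sum_{j=1}^{k} (c^0 - c^j)\, E_{\underline{\theta}} T^j_\pi(n).
\end{align*}
```

The term multiplying $g^B$ vanishes because exactly one population is sampled in each period, and the same identity turns $n c^0$ into $\sum_j c^0\, E_{\underline{\theta}} T^j_\pi(n)$; the second sum is also equal to $n c^0 - E_{\underline{\theta}} C_\pi(n)$.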

We now state: 

Definition 1. a) A feasible policy $\pi$ is called consistent if

$$R_\pi(\underline{\theta}, n) = o(n), \quad n \to \infty, \ \forall\, \underline{\theta} \in \Theta.$$

b) A feasible policy $\pi$ is called f-uniformly fast (f-UF) if

$$R_\pi(\underline{\theta}, n) = o(n^a), \quad n \to \infty, \ \forall\, a > 0, \ \forall\, \underline{\theta} \in \Theta.$$

In the sequel we will show that there exist f-UF policies, following the approach of Burnetas and Katehakis
(1996b), by construction of a function $M(\underline{\theta})$ and an f-UF policy $\pi^0$ such that

$$\liminf_{n \to \infty} R_{\pi^0}(\underline{\theta}, n)/\log n \le M(\underline{\theta}), \quad \forall\, \underline{\theta} \in \Theta.$$

As will be shown later (Theorem 1), policy $\pi^0$ has the much stronger property of asymptotic optimality.
Indeed, $M(\underline{\theta})$ is also a uniform lower bound on the limit of $R_\pi(\underline{\theta}, n)/\log n$ of any f-UF policy.

3. Lower Bound for the Regret 

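Recall that for $b \in F$, $b$ is an optimal solution of linear program LP-1 for some $\underline{\theta} \in \Theta$ if and only
if for at least one corresponding basic matrix $B$, $\phi^B_\alpha(\underline{\theta}) \ge 0$, $\alpha = 1, \dots, k$.

For any $b \in s(\underline{\theta})$, where $b = \{i, j\}$ or $\{i\}$ and $\alpha \ne i, j$, we define the sets $\Delta\Theta_\alpha(\underline{\theta})$ and $D(\underline{\theta})$ as
follows. The first set includes all perturbed values $\theta'_\alpha$ of $\theta_\alpha$ of population $\alpha$ such that the complete
information problem under $\underline{\theta}'$, where only $\theta_\alpha$ is perturbed to $\theta'_\alpha$, has a unique optimal BFS which includes
population $\alpha$. The second set $D(\underline{\theta})$ contains all populations which are not contained in any optimal
solution under parameter set $\underline{\theta}$ but for which, by varying only parameter $\theta_\alpha$, a uniquely optimal BFS that
contains them can be found. Formally,

$$\Delta\Theta_\alpha(\underline{\theta}) = \{\theta'_\alpha \in \Theta_\alpha : s(\underline{\theta}') = \{\{i, \alpha\}\} \text{ or } \{\{\alpha, j\}\} \text{ or } \{\{\alpha\}\}\},$$

where $\underline{\theta}' = (\theta_1, \dots, \theta'_\alpha, \dots, \theta_k)$ is a new vector such that only parameter $\theta_\alpha$ is changed to $\theta'_\alpha$.
Then $D(\underline{\theta})$ is the set of populations which are not optimal under $\underline{\theta}$ but become part of a uniquely
optimal BFS after a parameter change of $\theta_\alpha$ only:

$$D(\underline{\theta}) = \{\alpha : \alpha \notin b \text{ for any } b \in s(\underline{\theta}) \text{ and } \Delta\Theta_\alpha(\underline{\theta}) \ne \emptyset\}.$$

Let $I(\theta_\alpha, \theta'_\alpha)$ denote the Kullback-Leibler information number, defined as

$$I(\theta_\alpha, \theta'_\alpha) = \int_{-\infty}^{\infty} \log \frac{f_\alpha(x; \theta_\alpha)}{f_\alpha(x; \theta'_\alpha)}\, f_\alpha(x; \theta_\alpha)\, d\nu(x).$$

Now we can define the minimum deviation, in the sense of the Kullback-Leibler information
number, of parameter $\theta'_\alpha$ from $\theta_\alpha$ in order to achieve that population $\alpha$ becomes optimal under $\underline{\theta}'$:

$$K_\alpha(\underline{\theta}) = \inf\{I(\theta_\alpha, \theta'_\alpha) : \theta'_\alpha \in \Delta\Theta_\alpha(\underline{\theta})\}.$$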


We have: 

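Lemma 1 For any $\underline{\theta}$, and any optimal matrix $B$ under $\underline{\theta}$, $\exists\, \rho = \rho(\underline{\theta}, \alpha, B) > 0$ such that for any
$\theta'_\alpha \in \Delta\Theta_\alpha(\underline{\theta})$:

(i) $\phi^B_j(\underline{\theta}') = \phi^B_j(\underline{\theta}) \ge 0$, $\forall\, j \ne \alpha$, and $\phi^B_\alpha(\underline{\theta}') = \phi^B_\alpha(\underline{\theta}) + \mu_\alpha(\theta_\alpha) - \mu_\alpha(\theta'_\alpha) < 0$.

(ii) $\mu^*_\alpha(\underline{\theta}) < \mu_\alpha(\theta'_\alpha) < \mu^*_\alpha(\underline{\theta}) + \rho$, where $\mu^*_\alpha(\underline{\theta}) = \phi^B_\alpha(\underline{\theta}) + \mu_\alpha(\theta_\alpha)$.

The above Lemma implies the following form for $K_\alpha(\underline{\theta})$, which is necessary for the proofs of the
Lemmas and Theorems of the paper:

$$K_\alpha(\underline{\theta}) = \inf\{I(\theta_\alpha, \theta'_\alpha) : \theta'_\alpha \in \Theta_\alpha,\ \mu^*_\alpha(\underline{\theta}) < \mu_\alpha(\theta'_\alpha) < \mu^*_\alpha(\underline{\theta}) + \rho\},$$

where $\rho = \rho(\underline{\theta}, \alpha, B) > 0$.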

Lemma 2 and Proposition 1 below are used to establish Lemma 3, from which
Theorem 1 for the regret function follows.

First note that in Eq. (6) both terms are nonnegative, the first because of optimality and the
second because of feasibility. Therefore, a necessary and sufficient condition for a policy $\pi$ to be f-UF
is that for $\underline{\theta} \in \Theta$, any optimal BFS $b$ under $\underline{\theta}$, and all $B$ corresponding to $b$:

$$\phi^B_j(\underline{\theta}) \lim_{n \to \infty} \frac{E_{\underline{\theta}} T^j_\pi(n)}{n^a} = 0, \quad \text{for all } a > 0,\ j \notin b, \tag{7}$$

and

$$\lambda^B \lim_{n \to \infty} \frac{\sum_{j=1}^{k} (c^0 - c^j)\, E_{\underline{\theta}} T^j_\pi(n)}{n^a} = 0, \quad \text{for all } a > 0. \tag{8}$$


We can now state: 

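Lemma 2 Assume $b$ is a uniquely optimal BFS and $B$ is an optimal basic matrix corresponding to $b$. Then

(i) if $B = \begin{pmatrix} c^i & c^j \\ 1 & 1 \end{pmatrix}$, for some $i \le d < j$, then $\lambda^B > 0$,

(ii) if $B = \begin{pmatrix} c^i & 1 \\ 1 & 0 \end{pmatrix}$, for some $i \le d$, then $\lambda^B = 0$.

Proposition 1 For any f-UF policy $\pi$ and for all $\underline{\theta} \in \Theta$ we have that for $\alpha \in D(\underline{\theta})$, any $\theta'_\alpha \in \Delta\Theta_\alpha(\underline{\theta})$
and for all positive sequences $\beta_n = o(n)$ it is true that

$$P_{\underline{\theta}'}[T^\alpha_\pi(n) < \beta_n] = o(n^{a-1}), \quad \text{for all } a > 0.$$

So far we have shown that a necessary condition for a uniformly fast policy is that $\forall\, \underline{\theta} \in \Theta$ and
$\forall\, \alpha \in D(\underline{\theta})$ the numbers of samples from populations $j_0$ and $\alpha$ are at least $\beta_n$
correspondingly, because $P_{\underline{\theta}'}(T^{j_0}_\pi(n) < \beta_n) = o(n^{a-1})$ and $P_{\underline{\theta}'}(T^\alpha_\pi(n) < \beta_n) = o(n^{a-1})$ for any positive
sequence of constants $\beta_n = o(n)$.

Lemma 3 If $P_{\underline{\theta}'}[T^\alpha_\pi(n) < \beta_n] = o(n^{a-1})$, for all $a > 0$ and a positive sequence $\beta_n = o(n)$, then

$$\lim_{n \to \infty} P_{\underline{\theta}}\!\left[T^\alpha_\pi(n) < \frac{\log n}{K_\alpha(\underline{\theta})}\right] = 0,$$

for all $\underline{\theta} \in \Theta$ and $\alpha \in D(\underline{\theta})$.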


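We next define the function $M(\underline{\theta})$ and prove the main theorem of this section. Let

$$M(\underline{\theta}) = \sum_{j \in D(\underline{\theta})} \frac{\phi^B_j(\underline{\theta})}{K_j(\underline{\theta})}.$$

Theorem 1 If $\pi$ is an f-UF policy then

$$\liminf_{n \to \infty} \frac{R_\pi(\underline{\theta}, n)}{\log n} \ge M(\underline{\theta}), \quad \forall\, \underline{\theta} \in \Theta.$$

Proof Recall,

$$R_\pi(\underline{\theta}, n) = \sum_{j=1}^{k} \phi^B_j(\underline{\theta})\, E_{\underline{\theta}} T^j_\pi(n) + \lambda^B [n c^0 - E_{\underline{\theta}} C_\pi(n)],$$

and by Lemma 3, using the Markov inequality, we obtain that if $\pi$ is f-UF, then

$$\liminf_{n \to \infty} \frac{E_{\underline{\theta}} T^j_\pi(n)}{\log n} \ge \frac{1}{K_j(\underline{\theta})}, \quad \forall\, j \in D(\underline{\theta}),\ \forall\, \underline{\theta} \in \Theta.$$

Also, we have from Lemma 2 that $\lambda^B \ge 0$, and from Eq. (3) we have that $n c^0 - E_{\underline{\theta}} C_\pi(n) \ge 0$
for all $n$. Finally, the optimal populations under $\underline{\theta}$ have $\phi^B_j(\underline{\theta}) = 0$, thus

$$\liminf_{n \to \infty} \frac{R_\pi(\underline{\theta}, n)}{\log n} \ge \sum_{j \in D(\underline{\theta})} \frac{\phi^B_j(\underline{\theta})}{K_j(\underline{\theta})} = M(\underline{\theta}), \quad \text{for all } \underline{\theta} \in \Theta.$$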
4. Blocks and Block Based Policies

We consider a class of policies such that sampling is performed in groups of successive periods,
called sampling blocks, of finite length, where the total cost of actions in each block satisfies the cost
constraint of Eq. (3), as follows. Define the differences

$$\delta^i = c^i - c^0.$$

$\delta^i$ expresses the net cost effect of a single observation from a population $i$ on the sampling budget.
This effect is a net cost if $\delta^i > 0$ or a net saving if $\delta^i < 0$.

The original problem is equivalent to the transformed problem where $c^i = \delta^i$, $i = 1, \dots, k$, $c^0 = 0$,
and the sampling constraint is

$$\frac{1}{n} \sum_{t=1}^{n} \delta^{A_t} \le 0, \quad \forall\, n.$$

Since the $\delta^i$ are assumed to be rational for each $i = 1, \dots, k$, and there is a finite number of them,
we may assume, without loss of generality, that they are all integers.

Let $J \subseteq \{1, \dots, k\}$ be the subset of populations sampled within a sampling block. The “cheap”
populations in $J$ must be sampled often enough to finance sampling of the “expensive” ones.
Mathematically it suffices to find $\{m_j, j \in J\}$ such that each population $j \in J$ is sampled $m_j$ times,
and $\sum_{j \in J} m_j \delta^j \le 0$, $m_j \in \mathbb{N}$, $\forall\, j \in J$. Any block with $m_j$ satisfying the previous properties is
called admissible. One possibility is to consider the smallest block, which will be appropriate in the
incomplete information case. Thus the minimum length of the sampling block, $\ell(J)$, is the solution
of the following integer linear program:

$$\ell(J) = \min\Bigl\{\sum_{j \in J} m_j : \sum_{j \in J} m_j \delta^j \le 0 \ \ \&\ \ m_j \in \mathbb{N},\ \forall\, j \in J\Bigr\}.$$
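For a handful of populations the integer program for the minimum block length can be solved by direct enumeration. The sketch below is an illustration only: it assumes integer net costs, reads $m_j \in \mathbb{N}$ as "each population in $J$ is sampled at least once", and uses a hypothetical search cap `max_len`; its running time grows exponentially with the number of populations.

```python
from itertools import product


def min_block(deltas, max_len=50):
    """Smallest admissible block for integer net costs `deltas`.

    Returns (length, counts) with every count >= 1 and
    sum(m_j * delta_j) <= 0, or None if no admissible block of
    length <= max_len exists.
    """
    k = len(deltas)
    for length in range(k, max_len + 1):
        # Each count is at least 1, so no count can exceed length - k + 1.
        for counts in product(range(1, length - k + 2), repeat=k):
            if sum(counts) != length:
                continue
            if sum(m * d for m, d in zip(counts, deltas)) <= 0:
                return length, counts
    return None


print(min_block([-1, 2]))      # one cheap and one expensive population
print(min_block([-2, 1, 3]))   # three populations
```

With net costs $(-1, 2)$ the cheap population has to be sampled twice before one expensive sample is financed, so the shortest admissible block has length 3.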

An optimal solution of LP-1 specifies randomization probabilities that guarantee maximization
of the average reward subject to the cost constraint. The populations in this optimal solution
define the set $J$, and $J$, $\delta^i$ and $\ell$ are observable constants.

We use the Initial Sampling Block (ISB) and Linear Programming Block (LPB) blocks below to
define a class of policies $\pi$ that are feasible, as follows.

a) A policy $\pi$ starts with an ISB block during which all populations $\{1, \dots, k\}$ are sampled at
least a predetermined number of times $n_0$, with a sufficient number of samples taken from cheap
(small $c^i$) populations, so that the constraint of Eq. (3) is satisfied sample-path wise. This block is
necessary in order to obtain initial estimates of $\theta_j$ for all populations. This means that the ISB
block has the minimum length $\ell(J)$, defined above, with $J = \{1, \dots, k\}$.

b) After completion of an ISB block, a policy $\pi$ chooses any BFS $b$ (or equivalently a single
population $\{i\}$ or a pair $\{i, j\}$) and continues sampling for a block of time periods LPB = LPB($b$)
as follows.

i) When $b = \{i\}$ (which means that $c^i \le c^0$), $\pi$ samples from population $i$ only once. In this case
we define the LPB block to have length equal to $\ell_b = 1$, and its sampling frequency $x_i$ to be equal
to 1, $x_i = 1$.

ii) When $b = \{i, j\}$, $\pi$ samples each population in $b$ a number of times chosen so that the cost
feasibility of $\pi$ is maintained during the block. The latter is accomplished by taking the length of
the LPB block to be equal to $m_i + m_j = |\delta^j| + |\delta^i|$, where $m_i = |\delta^j|$ and $m_j = |\delta^i|$, and sampling
the least cost population first, in such a way that the frequencies are equal to the randomization
probabilities:

$$x_i = \frac{|\delta^j|}{|\delta^i| + |\delta^j|}, \qquad x_j = \frac{|\delta^i|}{|\delta^i| + |\delta^j|}.$$
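The pair case of the LPB block can be illustrated with a few lines of code. This is a minimal sketch assuming integer net costs with $\delta^i \le 0 < \delta^j$; the function names and the example values are hypothetical.

```python
def lpb_block(i, j, delta_i, delta_j):
    """Sampling order for an LPB block on the pair {i, j}.

    Assumes integer net costs with delta_i <= 0 < delta_j. The cheap
    population i is sampled |delta_j| times first, then the expensive
    population j is sampled |delta_i| times.
    """
    m_i, m_j = abs(delta_j), abs(delta_i)
    return [i] * m_i + [j] * m_j


def prefix_feasible(block, delta):
    """True if the cumulative net cost is <= 0 after every sample."""
    total = 0
    for a in block:
        total += delta[a]
        if total > 0:
            return False
    return True


delta = {0: -1, 1: 2}          # hypothetical net costs of the pair
block = lpb_block(0, 1, delta[0], delta[1])
print(block, prefix_feasible(block, delta))
```

Sampling the cheap population first builds up a surplus of $|\delta^i||\delta^j|$, which the expensive samples then consume exactly, so the cumulative net cost returns to zero at the end of the block and never becomes positive inside it.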
Remark 1 Note that in the second case of an LPB, the randomization probabilities for $\{i, j\}$ and
the block length are computed without solving LP-1, using the known $\delta$'s, cf. Eq. (4).

Note that a block based policy is a well defined adaptive policy. In the sequel we restrict our
attention to block based policies; for notational simplicity we will simply write $\pi$ for a block based
policy when there is no risk of confusion.

Assume that we have $l$ successive blocks. We take $T^b_\pi(l)$ to be the number of LPB($b$) type blocks
in the first $l \ge 2$ blocks (since for $l = 1$ we start with an ISB block). Thus $\sum_{b} T^b_\pi(l) = l - 1$. Let $S_\pi(l)$
be the total length of the first $l$ blocks and let $L_n = L_\pi(n)$ denote the number of blocks in $n$ periods.
We can easily show that

$$T^\alpha_\pi(S_\pi(l)) = m^\alpha_0 + \sum_{b:\, \alpha \in b} m^\alpha_b\, T^b_\pi(l),$$

where $m^\alpha_b$ is the number of samples from population $\alpha$ within an LPB($b$) block and $m^\alpha_0$ is the number of
samples from population $\alpha$ in the ISB block. Now we can define the regret of blocks

$$
\tilde{R}_\pi(\underline{\theta}, l) = z^*(\underline{\theta})\, E_{\underline{\theta}} S_\pi(l)
- E_{\underline{\theta}} \sum_{j=1}^{k} \sum_{b:\, j \in b} \mu_j(\theta_j)\, m^j_b\, T^b_\pi(l)
- \sum_{j=1}^{k} \mu_j(\theta_j)\, m^j_0.
$$

We note that

$$T^\alpha_\pi(S_\pi(L_n)) \le T^\alpha_\pi(n) \le T^\alpha_\pi(S_\pi(L_n)) + m^\alpha_{\max}, \tag{9}$$

where $m^\alpha_{\max}$ is the maximum number of times population $\alpha$ appears in any block. Thus we
obtain the following relation for the two types of regret:

$$
\tilde{R}_\pi(\underline{\theta}, L_n) + (n - E_{\underline{\theta}} S_\pi(L_n))\, z^*(\underline{\theta}) - \sum_{j=1}^{k} \mu_j(\theta_j)\, m^j_{\max}
\le R_\pi(\underline{\theta}, n) \le \tilde{R}_\pi(\underline{\theta}, L_n) + (n - E_{\underline{\theta}} S_\pi(L_n))\, z^*(\underline{\theta}). \tag{10}
$$

The above and Eq. (10) imply the following relation between the two regret functions:

$$\limsup_{n \to \infty} \frac{R_\pi(\underline{\theta}, n)}{\log n} = \limsup_{n \to \infty} \frac{\tilde{R}_\pi(\underline{\theta}, L_n)}{\log L_n}. \tag{11}$$

From Eq. (11), it follows that in order to find a policy that achieves the lower bound for $R_\pi(\underline{\theta}, n)$, it
suffices to find a policy that achieves the lower bound for $\tilde{R}_\pi(\underline{\theta}, L_n)$.


5. Asymptotically Optimal Policies 

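In this section we provide a general method to construct asymptotically optimal policies $\pi^0$ that
achieve the lower bound for the regret. To state the policy we need some definitions. We define at
any block $l$ and for every population $\alpha$ the quantity $\bar{\mu}_\alpha$ as

$$\bar{\mu}_\alpha = \sup\Bigl\{\mu_\alpha(\theta'_\alpha) : I(\hat{\theta}^l_\alpha, \theta'_\alpha) \le \frac{\log S_\pi(l-1)}{T^\alpha_\pi(S_\pi(l-1))}\Bigr\}$$

and the set $\Psi^{(B, \hat{\underline{\theta}}^l)}$ as

$$\Psi^{(B, \hat{\underline{\theta}}^l)} = \{\alpha : \mu^*_\alpha(\hat{\underline{\theta}}^l) < \bar{\mu}_\alpha < \mu^*_\alpha(\hat{\underline{\theta}}^l) + \rho(\hat{\underline{\theta}}^l, \alpha, B)\}.$$

We recall that if we have an optimal BFS $b$, where $b = \{i, j\}$ or $\{i\}$, then the optimal solution is
$z^b = \mu_i x_i + \mu_j x_j$ or $z^b = \mu_i$.

INFLATED Z-POLICY $\pi^0$:

Start with one ISB block in order to have at least one estimate from each population. Then:

Step 1 Assume that at the beginning of block $l$, $l > 1$, we have the estimates $\hat{\underline{\theta}}^l$ from the previous
$l - 1$ blocks. We take the solution of LP-1:

$$z^{b(\hat{\underline{\theta}}^l)} = \max_{b_i} \bigl\{ z^{b_i(\hat{\underline{\theta}}^l)} : T^{b_i}_\pi(l-1) \ge \tau (l - 1) \bigr\},$$

where $b_i$ are all the BFS in $F$ and $\tau$ is any fixed constant in $(0, 1/|F|)$.

Step 2 Then for every $\alpha = 1, \dots, k$ we compute the $\bar{\mu}_\alpha$'s and $\Psi^{(B, \hat{\underline{\theta}}^l)}$.
Then, if $\Psi^{(B, \hat{\underline{\theta}}^l)} = \emptyset$, we take $\pi^0(\hat{\underline{\theta}}^l) = b(\hat{\underline{\theta}}^l)$; otherwise for every $\alpha \in \Psi^{(B, \hat{\underline{\theta}}^l)}$
we define the index

$$u_\alpha(\hat{\underline{\theta}}^l, \theta'_\alpha) = \max\Bigl\{ z^{b(\hat{\underline{\theta}}^l, \theta'_\alpha)} : I(\hat{\theta}^l_\alpha, \theta'_\alpha) \le \frac{\log S_\pi(l-1)}{T^\alpha_\pi(S_\pi(l-1))} \Bigr\}$$

and we take

$$\pi^0(\hat{\underline{\theta}}^l) = \arg\max \bigl\{ u_\alpha(\hat{\underline{\theta}}^l, \theta'_\alpha),\ \alpha \in \Psi^{(B, \hat{\underline{\theta}}^l)} \bigr\}.$$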

Remark 2 a) In Step 1 of our policy we have to compute the values of the objective function for a finite
number of basic feasible solutions. These computations are not complicated because the LP solution
only needs the mean values of the populations at this block and the randomization frequencies, which
are known constants that depend only on which populations are in the BFS. We recall that
if we have a BFS $b$, where $b = \{i, j\}$ or $\{i\}$, then the optimal solution is $z^b = \mu_i x_i + \mu_j x_j$ or $z^b = \mu_i$.
Thus, in order to compute the value of the objective function it is not required to solve the LPs but
only to compute and compare the corresponding $z^b$, using these explicit formulas.
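The comparison described in Remark 2 amounts to evaluating one closed-form expression per BFS. A minimal sketch follows, assuming integer costs, 0-based population indices, and hypothetical means; the helper name `bfs_values` is not from the paper, and the additional frequency condition of Step 1 is not modeled here.

```python
from fractions import Fraction


def bfs_values(mu, c, c0):
    """Objective value z^b of every BFS b, using the explicit formulas.

    Singles (i,) require c[i] <= c0; pairs (i, j) require
    c[i] <= c0 < c[j]. Costs and c0 are assumed to be integers.
    """
    cheap = [i for i in range(len(c)) if c[i] <= c0]
    costly = [j for j in range(len(c)) if c[j] > c0]
    values = {(i,): Fraction(mu[i]) for i in cheap}
    for i in cheap:
        for j in costly:
            # Frequency of the costly population; the cheap one gets 1 - x_j.
            x_j = Fraction(c0 - c[i], c[j] - c[i])
            values[(i, j)] = (1 - x_j) * mu[i] + x_j * mu[j]
    return values


values = bfs_values(mu=[1, 2, 5], c=[1, 2, 4], c0=3)
best = max(values, key=values.get)
print(best, values[best])
```

No linear program is solved: the frequencies depend only on the costs of the populations in the BFS, so each candidate value is a fixed convex combination of at most two means.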

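The main result of this paper is that under the following conditions policy $\pi^0$ is asymptotically
optimal.

To state condition (C1) we need the definition of the index $J_\alpha(\underline{\theta}, \epsilon)$ of population $\alpha$. For any
$\underline{\theta} \in \Theta$, $\epsilon > 0$, an optimal matrix $B$ under $\underline{\theta}$, and a $\rho(\underline{\theta}, \alpha, B)$ as in Lemma 1, we define
$\Theta'_\alpha(\epsilon) = \{\theta'_\alpha : \mu^*_\alpha(\underline{\theta}) - \epsilon < \mu_\alpha(\theta'_\alpha) < \mu^*_\alpha(\underline{\theta}) + \rho(\underline{\theta}, \alpha, B) - \epsilon\}$ and

$$J_\alpha(\underline{\theta}, \epsilon) = \inf_{\theta'_\alpha \in \Theta'_\alpha(\epsilon)} \{ I(\theta_\alpha, \theta'_\alpha) : z(\underline{\theta}') > z^*(\underline{\theta}) - \epsilon \}.$$

From the definition of the index $J_\alpha(\hat{\underline{\theta}}^l, \epsilon)$, where $\alpha$ belongs to the set of populations considered in Step 2 of the policy,

$$J_\alpha(\hat{\underline{\theta}}^l, \epsilon) = \inf\{ I(\hat{\theta}^l_\alpha, \theta'_\alpha) : z^{b(\hat{\underline{\theta}}^l, \theta'_\alpha)} > z^*(\underline{\theta}) - \epsilon \},$$

we have that $u_\alpha(\hat{\underline{\theta}}^l, \theta'_\alpha) > z^*(\underline{\theta}) - \epsilon$ if and only if $J_\alpha(\hat{\underline{\theta}}^l, \epsilon) < \log S_\pi(l-1) / T^\alpha_\pi(S_\pi(l-1))$.

(C1) $\forall\, \underline{\theta} \in \Theta$, $i \notin s(\underline{\theta})$ such that $\Delta\Theta_i(\underline{\theta}) = \emptyset$, if $\mu^*_i(\underline{\theta}) - \epsilon < \mu_i(\theta'_i) < \mu^*_i(\underline{\theta}) + \rho(\underline{\theta}, i, B) - \epsilon$, $\forall\, \epsilon > 0$,
for some $\theta'_i \in \Theta_i$, the following relation holds:

$$\lim_{\epsilon \to 0} J_i(\underline{\theta}, \epsilon) = \infty.$$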
p—i-n — 
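(C2) $\forall\, i$, $\forall\, \theta_i \in \Theta_i$, $\forall\, \epsilon > 0$,

$$P_{\theta_i}(|\hat{\theta}^t_i - \theta_i| > \epsilon) = o(1/t), \quad \text{as } t \to \infty.$$

(C3) $\forall\, b_\alpha \in s(\underline{\theta})$, $\forall\, i$, $\forall\, \theta_i \in \Theta_i$, $\forall\, \epsilon > 0$, as $t \to \infty$,

$$P_{\underline{\theta}}\bigl(z^{b_\alpha(\hat{\underline{\theta}}^j, \theta'_\alpha)} \le z^*(\underline{\theta}) - \epsilon, \text{ for some } j \le t\bigr) = o(1/t).$$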

Next, we state and prove the main theorem of the paper. 

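Theorem 2. Under conditions (C1), (C2), and (C3), for policy $\pi^0$ defined above the following
holds:

$$\limsup_{n \to \infty} \frac{R_{\pi^0}(\underline{\theta}, n)}{\log n} \le M(\underline{\theta}), \quad \text{for all } \underline{\theta} \in \Theta.$$

Proof

To establish the above inequality it is sufficient to show that for policy $\pi^0$ the relations below
hold:

$$\limsup_{n \to \infty} \frac{E_{\underline{\theta}} T^j_{\pi^0}(n)}{\log n} \le \frac{1}{K_j(\underline{\theta})}, \quad \forall\, j \in D(\underline{\theta}), \tag{12}$$

$$\limsup_{n \to \infty} \frac{E_{\underline{\theta}} T^j_{\pi^0}(n)}{\log n} = 0, \quad \forall\, j \notin D(\underline{\theta}), \tag{13}$$

$$n c^0 - E_{\underline{\theta}} C_{\pi^0}(n) = o(\log n). \tag{14}$$

The proof of these relations is given in the appendix.

Remark 3 According to Remark 4b in Burnetas and Katehakis (1996b), condition (C2) is equivalent
to (C2') below, which is easier to verify.

(C2') $\forall\, \delta > 0$, as $t \to \infty$,

$$\sum_{j=1}^{t-1} P_{\underline{\theta}}\bigl(b(\hat{\underline{\theta}}^j) \in s(\underline{\theta}),\ J_\alpha(\hat{\underline{\theta}}^j, \epsilon) < J_\alpha(\underline{\theta}, \epsilon) - \delta\bigr) = o(\log t).$$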
i=i 

6. Normal Distributions with known variances 

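Assume the observations from population $\alpha$ are normally distributed with unknown means
$E X^\alpha = \theta_\alpha$ and known variances $\sigma^2_\alpha$, i.e., the parameter $\theta_\alpha$ is the mean, $\mu_\alpha(\theta_\alpha) = \theta_\alpha$, and $\Theta_\alpha = (-\infty, +\infty)$. Given history
$h_l$, define

$$\hat{\theta}^l_\alpha = \frac{\sum_{t=1}^{T^\alpha_{\pi^0}(S_{\pi^0}(l-1))} X^\alpha_t}{T^\alpha_{\pi^0}(S_{\pi^0}(l-1))}.$$

Now from the definition of $\Theta_\alpha$, it follows that $\Delta\Theta_\alpha(\underline{\theta}) = (\theta_\alpha + \phi^B_\alpha(\underline{\theta}),\ \theta_\alpha + \phi^B_\alpha(\underline{\theta}) + \rho(\underline{\theta}, \alpha, B))$
for any optimal matrix $B$ under $\underline{\theta}$, therefore $D(\underline{\theta}) = \{1, \dots, k\}$, $\forall\, \underline{\theta} \in \Theta$. Thus, we can see from the
structure of the sets $\Theta_\alpha$ and $\Delta\Theta_\alpha(\underline{\theta})$ that condition (C1) is satisfied.

Also, we have:

$$I(\theta_\alpha, \theta'_\alpha) = \frac{(\theta'_\alpha - \theta_\alpha)^2}{2\sigma^2_\alpha}, \qquad K_\alpha(\underline{\theta}) = \frac{(\phi^B_\alpha(\underline{\theta}))^2}{2\sigma^2_\alpha}.$$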
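In the Normal case the constant of the lower bound therefore has a closed form, since $\phi^B_j / K_j = 2\sigma^2_j / \phi^B_j$. The short sketch below evaluates it; the function names are hypothetical, and it assumes the caller passes only the populations in $D(\underline{\theta})$, each with a strictly positive reduced cost.

```python
def kl_normal(theta, theta_prime, sigma2):
    """I(theta, theta') for two normals with common known variance sigma2."""
    return (theta_prime - theta) ** 2 / (2.0 * sigma2)


def lower_bound_constant(phi, sigma2):
    """M(theta) = sum_j phi_j / K_j, with K_j = phi_j**2 / (2 * sigma2_j).

    `phi` and `sigma2` list the reduced costs and variances of the
    populations in D(theta) only; every phi_j is assumed positive.
    """
    return sum(2.0 * s2 / p for p, s2 in zip(phi, sigma2))


print(kl_normal(0.0, 1.0, 1.0))
print(lower_bound_constant([0.5, 2.0], [1.0, 1.0]))
```

Populations with small reduced costs dominate the constant, which matches the intuition that nearly optimal populations are the hardest to rule out.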


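Therefore our indices are equal to

$$u_\alpha(\hat{\underline{\theta}}^l, \theta'_\alpha) = z^{b(\hat{\underline{\theta}}^l, \theta^0_\alpha)},$$

where

$$\theta^0_\alpha = \hat{\theta}^l_\alpha + \sigma_\alpha \left( \frac{2 \log S_{\pi^0}(l-1)}{T^\alpha_{\pi^0}(S_{\pi^0}(l-1))} \right)^{1/2}.$$

For example, if $b(\hat{\underline{\theta}}^l, \theta^0_\alpha) = \{\alpha, j\}$, then $z^{b(\hat{\underline{\theta}}^l, \theta^0_\alpha)} = \theta^0_\alpha x_\alpha + \hat{\theta}^l_j x_j$ and $z^*(\underline{\theta}) = \theta_\alpha x_\alpha + \theta_j x_j$.

Therefore, for $b(\hat{\underline{\theta}}^l, \theta^0_\alpha) \in s(\underline{\theta})$ and from the structure of the index, $z^{b(\hat{\underline{\theta}}^l, \theta^0_\alpha)}$ is either a sum of normal
random variables, which is also normal, or a single normal random variable, and from the tail of the normal distribution
condition (C3) is satisfied.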

According to Remark 3 the next sum of probabilities is equivalent to condition (C2) 


f] Pe, m') G s(0), M0\ e) < JM. e) - <5) 

t^2 

Ln 

= e > 0, 


t=2 

where the equality follows after some algebra because of the normal distribution and the explicit 
form ofin this case: 

J,(0‘,e) = inf{/(0*,0') : > z*{9) - e} < 

e. 

J,(0,e) = ini{m,9\) : > z*{g) - e} - 5. 

f>'i 

Also, we have that 0* is the average of iid random normal variables with mean thus 
P9°{\^\ - 0i\ >0 < PeiHSi - 0i\ > for some I < t) 

t 

1=1 

where the last equality is a consequence of the tail inequality $1 - \Phi(x) < \varphi(x)/x$ for
the standard normal distribution. Thus, we can see that condition (C2) holds.
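The tail inequality used here, $1 - \Phi(x) < \varphi(x)/x$ for $x > 0$, is easy to check numerically. The following sketch relies only on the standard library; the function names are illustrative.

```python
import math


def normal_tail(x):
    """1 - Phi(x) for the standard normal, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mills_bound(x):
    """Upper bound phi(x) / x on the tail, valid for x > 0."""
    return math.exp(-0.5 * x * x) / (x * math.sqrt(2.0 * math.pi))


for x in (0.5, 1.0, 2.0, 4.0):
    print(x, normal_tail(x), mills_bound(x))
```

The bound is loose near zero and becomes tight as $x$ grows, which is the regime relevant for the deviation probabilities above.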

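Summary of Policy At the beginning we take an ISB block. Then at the beginning of block $l$ we
take

$$z^{b(\hat{\underline{\theta}}^l)} = \max_{b_i} \bigl\{ z^{b_i(\hat{\underline{\theta}}^l)} : T^{b_i}_{\pi^0}(l-1) \ge \tau (l-1) \bigr\}$$

and find our indices

$$u_\alpha(\hat{\underline{\theta}}^l, \theta'_\alpha) = z^{b(\hat{\underline{\theta}}^l, \theta^0_\alpha)},$$

where

$$\theta^0_\alpha = \hat{\theta}^l_\alpha + \sigma_\alpha \left( \frac{2 \log S_{\pi^0}(l-1)}{T^\alpha_{\pi^0}(S_{\pi^0}(l-1))} \right)^{1/2}. \tag{15}$$

Finally, we choose to employ as block $l$ the $\arg\max_\alpha \{ u_\alpha(\hat{\underline{\theta}}^l, \theta^0_\alpha) \}$.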
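A rough sketch of the index computation for the Normal case is given below. It only illustrates the inflated mean of Eq. (15) and the evaluation of the best BFS under the inflated vector; it deliberately omits the frequency condition of Step 1 and the restriction of the maximization to the populations selected in Step 2. All function names, the 0-based indexing, and the numerical inputs are assumptions made for the example.

```python
import math


def inflated_mean(theta_hat, sigma, total_len, n_samples):
    """Sample mean inflated as in Eq. (15): theta_hat + sigma*sqrt(2 log S / T)."""
    return theta_hat + sigma * math.sqrt(2.0 * math.log(total_len) / n_samples)


def best_bfs(mu, c, c0):
    """(value, support) of the best BFS of LP-1 for the mean vector mu."""
    cheap = [i for i in range(len(c)) if c[i] <= c0]
    costly = [j for j in range(len(c)) if c[j] > c0]
    best = max((mu[i], (i,)) for i in cheap)
    for i in cheap:
        for j in costly:
            x_j = (c0 - c[i]) / (c[j] - c[i])   # frequency of the costly arm
            best = max(best, ((1 - x_j) * mu[i] + x_j * mu[j], (i, j)))
    return best


def arm_index(alpha, theta_hat, sigma, counts, total_len, c, c0):
    """Index of population alpha: best LP value after inflating only its mean."""
    mu = list(theta_hat)
    mu[alpha] = inflated_mean(theta_hat[alpha], sigma[alpha], total_len, counts[alpha])
    return best_bfs(mu, c, c0)


theta_hat, sigma = [1.0, 2.0, 5.0], [1.0, 1.0, 1.0]   # hypothetical estimates
counts, total_len = [10, 10, 10], 30                   # samples so far
print(arm_index(1, theta_hat, sigma, counts, total_len, c=[1, 2, 4], c0=3))
```

The inflation term shrinks as a population accumulates samples, so rarely sampled populations receive optimistic indices and are revisited at a logarithmic rate.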


Remark 4 In the case in which the variances $\sigma^2_\alpha$ are unknown, we expect that a (log-rate regret) f-UF policy
can be obtained by replacing $\sigma^2_\alpha$ in Eq. (15) by a constant times $\hat{\sigma}^2_\alpha$, as in Auer et al. (2002). This
work is currently in progress.
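Appendix: Proofs

Lemma 1 For any $\underline{\theta}$, and any optimal matrix $B$ under $\underline{\theta}$, $\exists\, \rho = \rho(\underline{\theta}, \alpha, B) > 0$ such that for any
$\theta'_\alpha \in \Delta\Theta_\alpha(\underline{\theta})$:

(i) $\phi^B_j(\underline{\theta}') = \phi^B_j(\underline{\theta}) \ge 0$, $\forall\, j \ne \alpha$, and $\phi^B_\alpha(\underline{\theta}') = \phi^B_\alpha(\underline{\theta}) + \mu_\alpha(\theta_\alpha) - \mu_\alpha(\theta'_\alpha) < 0$,

(ii) $\mu^*_\alpha(\underline{\theta}) < \mu_\alpha(\theta'_\alpha) < \mu^*_\alpha(\underline{\theta}) + \rho$, where $\mu^*_\alpha(\underline{\theta}) = \phi^B_\alpha(\underline{\theta}) + \mu_\alpha(\theta_\alpha)$.

Proof (i) It is obvious that $\phi^B_j(\underline{\theta}') = \phi^B_j(\underline{\theta}) \ge 0$, $\forall\, j \ne \alpha$, because we only change the parameter
of population $\alpha$ and $\phi^B_j(\underline{\theta}') = \phi^B_j(\underline{\theta}) = c^j \lambda^B + g^B - \mu_j(\theta_j)$.

For a population $\alpha \in D(\underline{\theta})$ we have that $\alpha \notin b$ for any $b \in s(\underline{\theta})$. Therefore $\phi^B_\alpha(\underline{\theta}) = c^\alpha \lambda^B + g^B -
\mu_\alpha(\theta_\alpha) \ge 0$ for any $B$ corresponding to $b$.

Now, any optimal $b \in s(\underline{\theta})$ is not optimal under $\underline{\theta}' = (\theta_1, \dots, \theta'_\alpha, \dots, \theta_k)$, for any $\theta'_\alpha \in \Delta\Theta_\alpha(\underline{\theta})$,
thus $s(\underline{\theta}') = \{b'\}$ where $b' \notin s(\underline{\theta})$.

Therefore, for any optimal matrix $B$ under $\underline{\theta}$ we have that $\phi^B_\alpha(\underline{\theta}') = c^\alpha \lambda^B + g^B - \mu_\alpha(\theta'_\alpha) < 0$,
because $B$ is not optimal under $\underline{\theta}'$.

Now from $\phi^B_\alpha(\underline{\theta}) = c^\alpha \lambda^B + g^B - \mu_\alpha(\theta_\alpha)$ we have that $\phi^B_\alpha(\underline{\theta}') = \phi^B_\alpha(\underline{\theta}) + \mu_\alpha(\theta_\alpha) - \mu_\alpha(\theta'_\alpha) < 0$.

(ii) Consider first the case where $b = \{i, j\}$ is an optimal solution under $\underline{\theta}$ with corresponding
optimal matrix $B = B(\underline{\theta})$, and $b' = \{i, \alpha\}$ is an optimal solution under $\underline{\theta}'$ with corresponding optimal
matrix $B' = B(\underline{\theta}')$. From (i) we have that $z^*(\underline{\theta}') > z^*(\underline{\theta})$ iff $\mu_\alpha(\theta'_\alpha) > \mu^*_\alpha(\underline{\theta})$.

Since $b'$ is uniquely optimal under $\underline{\theta}'$ we have that $\phi^{B'}_s(\underline{\theta}') > 0$ for any $s \ne i, \alpha$. Now in order
for that condition to hold we use that $\phi^B_s(\underline{\theta}) \ge 0$ for any $s \ne i, j$, and we have that for $s > i$ it
suffices that $\mu^*_\alpha(\underline{\theta}) < \mu_\alpha(\theta'_\alpha)$, while for $s < i$ we must have $\mu^*_\alpha(\underline{\theta}) < \mu_\alpha(\theta'_\alpha) < \mu^*_\alpha(\underline{\theta}) + \rho$, where $\rho$ is
a positive constant. Thus, if $\mu^*_\alpha(\underline{\theta}) < \mu_\alpha(\theta'_\alpha) < \mu^*_\alpha(\underline{\theta}) + \rho$ then $\phi^{B'}_s(\underline{\theta}') > 0$ for any $s \ne i, \alpha$.

The other cases, where population $\alpha$ is a population with cost lower than $c^0$ and the optimal
solution under $\underline{\theta}'$ has the form $b' = \{\alpha, j\}$ or $b' = \{\alpha\}$, follow with the same arguments as in the
previous paragraph.

□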

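Lemma 2 Assume $b$ is a uniquely optimal BFS and $B$ any optimal matrix under $\underline{\theta}$. Then

(i) if $B = \begin{pmatrix} c^i & c^j \\ 1 & 1 \end{pmatrix}$, for some $i \le d < j$, then $\lambda^B > 0$,

(ii) if $B = \begin{pmatrix} c^i & 1 \\ 1 & 0 \end{pmatrix}$, for some $i \le d$, then $\lambda^B = 0$.

Proof (i) Let $\underline{\theta}$ be such that $s(\underline{\theta}) = \{b\}$, $b = \{i, j\}$ for $i \le d < j$. Then $\lambda^B > 0$, because if $\lambda^B = 0$ we must
have more than one solution in the primal, which cannot occur because $b$ is uniquely optimal.

(ii) Let $\underline{\theta}$ be such that $s(\underline{\theta}) = \{b\}$, $b = \{i\}$ for $i \le d$. Then $\lambda^B = 0$ from the dual solution and $\phi^B_j(\underline{\theta}) > 0$ for all
$j \ne i$.

□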

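We recall for the next Proposition

$$
\begin{aligned}
z^*(\underline{\theta}) = \max\ & \sum_{j=1}^{k} \mu_j(\theta_j)\, x_j \\
\text{s.t.}\ & \sum_{j=1}^{k} c^j x_j + y = c^0, \\
& \sum_{j=1}^{k} x_j = 1, \\
& x_j \ge 0,\ \forall j, \quad y \ge 0,
\end{aligned}
$$

and that a necessary and sufficient condition for a uniformly fast policy $\pi$ is that for $\underline{\theta} \in \Theta$ and any
optimal BFS $b$ under $\underline{\theta}$,

$$\phi^B_j(\underline{\theta}) \lim_{n \to \infty} \frac{E_{\underline{\theta}} T^j_\pi(n)}{n^a} = 0, \quad \text{for all } a > 0,\ j \notin b, \tag{16}$$

and also,

$$\lambda^B \lim_{n \to \infty} \frac{\sum_{j=1}^{k} (c^0 - c^j)\, E_{\underline{\theta}} T^j_\pi(n)}{n^a} = 0, \quad \text{for all } B \text{ corresponding to } b. \tag{17}$$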

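Proposition 1 For any uniformly fast policy $\pi$ and for all $\underline{\theta} \in \Theta$ we have that for $\alpha \in D(\underline{\theta})$,
any $\theta'_\alpha \in \Delta\Theta_\alpha(\underline{\theta})$ and for all positive $\beta_n = o(n)$ it is true that

$$P_{\underline{\theta}'}[T^\alpha_\pi(n) < \beta_n] = o(n^{a-1}), \quad \text{for all } a > 0.$$

Proof Let $\alpha \in D(\underline{\theta})$, $\theta'_\alpha \in \Delta\Theta_\alpha(\underline{\theta})$. Because of the definition of $\Delta\Theta_\alpha(\underline{\theta})$ we must have a $b'$ which
is uniquely optimal under $\underline{\theta}'$ ($s(\underline{\theta}') = \{b'\}$) and $\alpha \in b'$. Then we have two cases for the uniquely
optimal solution $b'$.

For the first case, where $b' = \{\alpha\}$: if $b'$ is nondegenerate then the basic matrix is $B' = \begin{pmatrix} c^\alpha & 1 \\ 1 & 0 \end{pmatrix}$
and from Lemma 2 $\lambda^{B'} = 0$, thus for a uniformly fast policy

$$E_{\underline{\theta}'} T^j_\pi(n) = o(n^a), \quad \text{for all } a > 0, \text{ for all } j \notin b'.$$

If $b'$ is degenerate then it must be true that $c^\alpha = c^0$. If we consider any matrix $B' = \begin{pmatrix} c^\alpha & c^j \\ 1 & 1 \end{pmatrix}$
then $\lambda^{B'} > 0$, thus $(c^0 - c^j) E_{\underline{\theta}'} T^j_\pi(n) + (c^0 - c^\alpha) E_{\underline{\theta}'} T^\alpha_\pi(n) = o(n^a)$, and since $c^0 = c^\alpha$ we have that
$E_{\underline{\theta}'} T^j_\pi(n) = o(n^a)$. Moreover, from Eq. (16), $E_{\underline{\theta}'} T^i_\pi(n) = o(n^a)$ for all $i \ne j, \alpha$, thus $E_{\underline{\theta}'} T^j_\pi(n) = o(n^a)$
for all $j \ne \alpha$.

Therefore,

$$n - E_{\underline{\theta}'} T^\alpha_\pi(n) = o(n^a), \quad \text{for all } a > 0. \tag{18}$$

It is also true that

$$
\begin{aligned}
E_{\underline{\theta}'} T^\alpha_\pi(n) &= \sum_{k=1}^{n} k\, P_{\underline{\theta}'}[T^\alpha_\pi(n) = k] \\
&= \sum_{k=1}^{\lfloor \beta_n \rfloor} k\, P_{\underline{\theta}'}[T^\alpha_\pi(n) = k] + \sum_{k=\lfloor \beta_n \rfloor + 1}^{n} k\, P_{\underline{\theta}'}[T^\alpha_\pi(n) = k] \\
&\le \beta_n P_{\underline{\theta}'}[T^\alpha_\pi(n) \le \beta_n] + n P_{\underline{\theta}'}[T^\alpha_\pi(n) > \beta_n] \\
&= n - (n - \beta_n) P_{\underline{\theta}'}[T^\alpha_\pi(n) \le \beta_n].
\end{aligned}
$$

Therefore

$$n - E_{\underline{\theta}'} T^\alpha_\pi(n) \ge (n - \beta_n) P_{\underline{\theta}'}[T^\alpha_\pi(n) \le \beta_n]. \tag{19}$$

From Eq. (18) and Eq. (19) we obtain

$$(n - \beta_n) P_{\underline{\theta}'}[T^\alpha_\pi(n) \le \beta_n] = o(n^a), \quad \text{for all } a > 0,$$

thus

$$P_{\underline{\theta}'}[T^\alpha_\pi(n) \le \beta_n] = o(n^{a-1}), \quad \text{for all } a > 0.$$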
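We next consider the case $b' = \{j_0, \alpha\}$ with $c^{j_0} \le c^0 < c^\alpha$. (The case $c^\alpha < c^0 < c^{j_0}$ is completely
analogous.) We have from Lemma 2 that $\lambda^{B'} > 0$, thus for a uniformly fast policy

$$E_{\underline{\theta}'} T^j_\pi(n) = o(n^a), \quad \forall\, a > 0,\ \forall\, j \notin b' = \{j_0, \alpha\}, \tag{20}$$

and

$$(c^0 - c^{j_0}) E_{\underline{\theta}'} T^{j_0}_\pi(n) + (c^0 - c^\alpha) E_{\underline{\theta}'} T^\alpha_\pi(n) = o(n^a), \quad \forall\, a > 0. \tag{21}$$

If we sum Eq. (20) for all $j \ne \alpha, j_0$ it follows that

$$n - E_{\underline{\theta}'} T^{j_0}_\pi(n) - E_{\underline{\theta}'} T^\alpha_\pi(n) = e_n, \quad \text{where } e_n = o(n^a),\ \forall\, a > 0. \tag{22}$$

Dividing Eq. (21) by $c^\alpha - c^{j_0}$ and using Eq. (22), we obtain after some algebra the following two
equalities:

$$n x'_{j_0} - E_{\underline{\theta}'} T^{j_0}_\pi(n) = o(n^a), \qquad n x'_\alpha - E_{\underline{\theta}'} T^\alpha_\pi(n) = o(n^a), \quad \forall\, a > 0, \tag{23}$$

where $x'_{j_0} = \dfrac{c^\alpha - c^0}{c^\alpha - c^{j_0}}$ and $x'_\alpha = \dfrac{c^0 - c^{j_0}}{c^\alpha - c^{j_0}}$ are the randomization probabilities which correspond to the optimal solution $b'$
of the linear program of Eq. (4) under $\underline{\theta}'$.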

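For any $n$ let

\[ \Gamma^{n}_{\pi} = \sum_{j \notin b} T^{j}_{\pi}(n), \qquad \mathcal{E}^{n}_{\pi} = \sum_{j \notin b} (c^{0} - c^{j}) T^{j}_{\pi}(n). \]

Thus, it is obvious that

\[ \mathcal{E}^{n}_{\pi} \le (c^{0} - c^{1}) \, \Gamma^{n}_{\pi}. \]

Furthermore, from Eq. (20),

\[ E_{\underline{\theta}'} \Gamma^{n}_{\pi} = o(n^{a}), \quad \forall\, a > 0. \tag{24} \]

Now, we know that

\[ n c^{0} - C_{\pi}(n) = \mathcal{E}^{n}_{\pi} + (c^{0} - c^{\alpha}) T^{\alpha}_{\pi}(n) + (c^{0} - c^{j_0}) T^{j_0}_{\pi}(n), \]

and from $n c^{0} - C_{\pi}(n) \ge 0$, $\forall\, n$, we have that

\[ (c^{\alpha} - c^{0}) T^{\alpha}_{\pi}(n) \le \mathcal{E}^{n}_{\pi} + (c^{0} - c^{j_0}) T^{j_0}_{\pi}(n), \]

therefore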
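
\begin{align*}
\frac{c^{\alpha} - c^{0}}{c^{\alpha} - c^{j_0}} \, T^{\alpha}_{\pi}(n)
  &\le \frac{\mathcal{E}^{n}_{\pi}}{c^{\alpha} - c^{j_0}} + \frac{c^{0} - c^{j_0}}{c^{\alpha} - c^{j_0}} \, T^{j_0}_{\pi}(n) \\
x'_{j_0} \, T^{\alpha}_{\pi}(n)
  &\le \frac{\mathcal{E}^{n}_{\pi}}{c^{\alpha} - c^{j_0}} + x'_{\alpha} T^{j_0}_{\pi}(n) \\
T^{\alpha}_{\pi}(n)
  &\le \frac{\mathcal{E}^{n}_{\pi}}{c^{\alpha} - c^{j_0}} + x'_{\alpha} \left( T^{\alpha}_{\pi}(n) + T^{j_0}_{\pi}(n) \right) \\
  &= \frac{\mathcal{E}^{n}_{\pi}}{c^{\alpha} - c^{j_0}} + x'_{\alpha} \left( n - \Gamma^{n}_{\pi} \right) \\
  &= n x'_{\alpha} + \frac{\mathcal{E}^{n}_{\pi}}{c^{\alpha} - c^{j_0}} - x'_{\alpha} \Gamma^{n}_{\pi}.
\end{align*}

Recall $\mathcal{E}^{n}_{\pi} \le \Gamma^{n}_{\pi} (c^{0} - c^{1})$, thus

\begin{align*}
T^{\alpha}_{\pi}(n) &\le n x'_{\alpha} + \frac{\Gamma^{n}_{\pi} (c^{0} - c^{1})}{c^{\alpha} - c^{j_0}} \\
  &= n x'_{\alpha} + \Gamma^{n}_{\pi} \, \rho(j_0, \alpha),
\end{align*}

where $\rho(j_0, \alpha) = \frac{c^{0} - c^{1}}{c^{\alpha} - c^{j_0}} > 0$.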

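Finally,

\[ n x'_{\alpha} - T^{\alpha}_{\pi}(n) + \Gamma^{n}_{\pi} \rho(j_0, \alpha) \ge 0. \tag{25} \]

Thus, from the Markov inequality, for any positive $\beta_n = o(n)$

\begin{align*}
P_{\underline{\theta}'} \left( n x'_{\alpha} - T^{\alpha}_{\pi}(n) + \Gamma^{n}_{\pi} \rho(j_0, \alpha) \ge n x'_{\alpha} - \beta_n \right)
  &\le \frac{E_{\underline{\theta}'} \left( n x'_{\alpha} - T^{\alpha}_{\pi}(n) + \Gamma^{n}_{\pi} \rho(j_0, \alpha) \right)}{n x'_{\alpha} - \beta_n} \\
  &= \frac{o(n^{a})}{n x'_{\alpha} - \beta_n} = o(n^{a-1}), \quad \forall\, a > 0.
\end{align*}

Therefore

\[ P_{\underline{\theta}'} \left( T^{\alpha}_{\pi}(n) \le \beta_n \right) \le P_{\underline{\theta}'} \left( T^{\alpha}_{\pi}(n) \le \beta_n + \Gamma^{n}_{\pi} \rho(j_0, \alpha) \right) = o(n^{a-1}), \quad \forall\, a > 0. \]

Substituting $T^{\alpha}_{\pi}(n) = n - \Gamma^{n}_{\pi} - T^{j_0}_{\pi}(n)$ into Eq. (25) we have

\[ T^{j_0}_{\pi}(n) - n x'_{j_0} + (1 + \rho(j_0, \alpha)) \Gamma^{n}_{\pi} \ge 0, \]

then

\[ P_{\underline{\theta}'} \left( T^{j_0}_{\pi}(n) \le \beta_n \right) = P_{\underline{\theta}'} \left( Z^{n}_{\pi} \le \beta_n - n x'_{j_0} + (1 + \rho(j_0, \alpha)) \Gamma^{n}_{\pi} \right), \]

where

\[ Z^{n}_{\pi} = T^{j_0}_{\pi}(n) - n x'_{j_0} + (1 + \rho(j_0, \alpha)) \Gamma^{n}_{\pi} \ge 0, \]

and

\[ E_{\underline{\theta}'} Z^{n}_{\pi} = o(n^{a}), \quad \forall\, a > 0, \ \text{from Eq. (23) and Eq. (24).} \]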
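
Let

\[ V^{n}_{\pi} = \{ Z^{n}_{\pi} \le \beta_n - n x'_{j_0} + (1 + \rho(j_0, \alpha)) \Gamma^{n}_{\pi} \}, \]

then

\begin{align*}
P_{\underline{\theta}'}(V^{n}_{\pi})
  &= P_{\underline{\theta}'}(V^{n}_{\pi} \cap \{ \Gamma^{n}_{\pi} \le n\delta \}) + P_{\underline{\theta}'}(V^{n}_{\pi} \cap \{ \Gamma^{n}_{\pi} > n\delta \}) \\
  &\le P_{\underline{\theta}'}(V^{n}_{\pi} \cap \{ \Gamma^{n}_{\pi} \le n\delta \}) + P_{\underline{\theta}'}(\Gamma^{n}_{\pi} > n\delta), \tag{26}
\end{align*}

where $0 < \delta < \frac{x'_{j_0}}{1 + \rho(j_0, \alpha)}$, and using Eq. (24) we have that

\[ P_{\underline{\theta}'}(\Gamma^{n}_{\pi} > n\delta) \le \frac{E_{\underline{\theta}'} \Gamma^{n}_{\pi}}{n\delta} = \frac{o(n^{a})}{n\delta} = o(n^{a-1}), \quad \forall\, a > 0. \tag{27} \]

Let

\begin{align*}
G^{n}_{\pi} &= V^{n}_{\pi} \cap \{ \Gamma^{n}_{\pi} \le n\delta \} \\
  &= \{ Z^{n}_{\pi} \le \beta_n - n x'_{j_0} + (1 + \rho(j_0, \alpha)) \Gamma^{n}_{\pi} \ \text{and} \ \Gamma^{n}_{\pi} \le n\delta \} \\
  &\subset \{ Z^{n}_{\pi} \le \beta_n + [(1 + \rho(j_0, \alpha))\delta - x'_{j_0}] n \} \\
  &= \{ Z^{n}_{\pi} \le \beta_n - n p \},
\end{align*}

where

\[ p = x'_{j_0} - (1 + \rho(j_0, \alpha))\delta > x'_{j_0} - (1 + \rho(j_0, \alpha)) \frac{x'_{j_0}}{1 + \rho(j_0, \alpha)} = 0. \]

Now for any positive $\beta_n = o(n)$,

\[ \exists\, n_0 : \ \beta_n - n p < 0, \quad \forall\, n > n_0, \]

and, since $Z^{n}_{\pi} \ge 0$, we have that

\[ P_{\underline{\theta}'}(G^{n}_{\pi}) = 0, \quad \forall\, n > n_0, \]

thus from Eq. (26) and Eq. (27)

\[ P_{\underline{\theta}'}(V^{n}_{\pi}) \le o(n^{a-1}), \quad \forall\, a > 0. \]

Finally,

\[ P_{\underline{\theta}'}(T^{j_0}_{\pi}(n) \le \beta_n) = o(n^{a-1}), \quad \forall\, a > 0, \ \text{for any positive } \beta_n = o(n). \]

□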
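
The last step of the argument relies on two elementary facts: any $\delta$ below $x'_{j_0}/(1+\rho)$ gives a strictly positive constant $p$, and a sublinear $\beta_n$ eventually falls below $np$. The sketch below illustrates both for the specific choice $\beta_n = \sqrt{n}$. The function name `first_negative_n`, the numeric values and the choice of $\beta_n$ are all assumptions made for illustration.

```python
import math

def first_negative_n(x_j0, rho, delta, beta, n_max=10**6):
    """Return (p, n) where n is the first index with beta(n) - n*p < 0.

    p = x_j0 - (1 + rho) * delta must be positive, which holds whenever
    0 < delta < x_j0 / (1 + rho).  Returns (p, None) if no such n <= n_max.
    """
    p = x_j0 - (1 + rho) * delta
    if p <= 0:
        raise ValueError("delta must be below x_j0 / (1 + rho)")
    for n in range(1, n_max + 1):
        if beta(n) - n * p < 0:
            return p, n
    return p, None

# Hypothetical values: x'_{j0} = 0.75, rho = 0.5, delta = 0.25 < 0.75 / 1.5.
p, n_first = first_negative_n(0.75, 0.5, 0.25, math.sqrt)
```

For $\beta_n = \sqrt{n}$ the difference $\sqrt{n} - np$ is negative exactly when $n > 1/p^2$, so once it turns negative it stays negative, and the event $\{Z \le \beta_n - np\}$ is empty from that index on because $Z$ is nonnegative.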


Lemma 3 If Pg' [T“(n) < /3„] = o(n“ ^), for all a > 0 and positive /3„ = o(n), then 


lim PsjTZin) < = 0, 


n—¥(yD = 


KM)' 


for all 0 G 0 and a € A(0). 

Proof If we take , then P^j [T“(n) < = o(n“ and using a change of measure

from $\underline{\theta}'$ to $\underline{\theta}$ and following the arguments in Burnetas and Katehakis (1996); Lai and Robbins (1985) we have that

logn 


hm Pe[T^(n) < = 0. 

n^oo — P-a\M.) 


□ 

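We recall for Theorem 2 that

1. \[ \limsup_{n \to \infty} \frac{E_{\underline{\theta}} T^{j}_{\pi}(n)}{\log n} \le \frac{1}{K_j(\underline{\theta})}, \quad \text{for all } j \in D(\underline{\theta}), \tag{28} \]

2. \[ \limsup_{n \to \infty} \frac{E_{\underline{\theta}} T^{j}_{\pi}(n)}{\log n} = 0, \quad \text{for all } j \notin D(\underline{\theta}), \tag{29} \]

3. \[ n c^{0} - E_{\underline{\theta}} C_{\pi}(n) = o(\log n). \tag{30} \]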

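From the definition of $T^{\alpha}_{\pi}(n)$ we can see that

\[ T^{\alpha}_{\pi}(S_{\pi}(L_n)) \le T^{\alpha}_{\pi}(n) \le T^{\alpha}_{\pi}(S_{\pi}(L_n)) + M_{\alpha}, \tag{31} \]

where $M_{\alpha}$ is the maximum number of times that population $\alpha$ appears in a block.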

We have derived T^{Ln) as: 

= ^lK = 6,&(|‘)^s(£)} + glK = 6,6(|‘)es(£)} 

t^2 t^2 

Ln. Lrt 

< E HK!) i Ki)} + E H’t? = b{i) e s(|)}. (32) 

t^2 t^2 

Finally, a policy $\pi$ is called feasible if

\[ \frac{C_{\pi}(n)}{n} \le c^{0}, \quad \forall\, n = 1, 2, \ldots. \tag{33} \]
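
The feasibility requirement must hold along every sample path and at every time n, not only on average. A minimal sketch of such a check follows, assuming a finite sampling sequence and a dictionary of known per-population costs. The names `is_feasible`, `costs` and `samples` are hypothetical.

```python
def is_feasible(costs, samples, c0):
    """Check that the running average cost never exceeds c0.

    costs:   mapping population -> known cost per sample (assumed input format)
    samples: sequence of populations in the order they were sampled
    c0:      target average cost per sample
    """
    total = 0
    for n, population in enumerate(samples, start=1):
        total += costs[population]
        # Compare total <= c0 * n to avoid dividing by n.
        if total > c0 * n:
            return False
    return True

costs = {"cheap": 2, "expensive": 10}
print(is_feasible(costs, ["cheap", "cheap", "cheap", "expensive"], 4))
print(is_feasible(costs, ["expensive", "cheap", "cheap", "cheap"], 4))
```

Both sequences have the same total cost after four samples, but the second one violates the constraint at its first sample, which is why the order of sampling matters under a sample-path constraint.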

Theorem 2 Under conditions (C1), (C2), and (C3), policy $\pi^{0}$ satisfies:

\[ \limsup_{n \to \infty} \frac{R_{\pi^{0}}(\underline{\theta}, n)}{\log n} \le M(\underline{\theta}), \quad \text{for all } \underline{\theta} \in \Theta. \]

Proof We need to prove Eq. (28), Eq. (29) and Eq. (30). Relations Eq. (28) and Eq. (29) follow from Eq. (31), Eq. (32) and Lemmas 4 and 5. Eq. (30) follows from Eq. (33), the feasibility of $\pi^{0}$ and block policies.

□ 

Lemma 4 Under conditions (C1),(C2), policy 7r° satisfies: 


hm sup ^- 

n—)-oo log En 

Eef^^o 2(L„) 

lim sup ^--- 

n—>oo log Ln 


— 5777a ’ ^ E>{9), i G b,b ^ s{9) and 

= 0, for all i ^ G b,b G s(0). 


Proof We decompose T^o 2 (^ 11 ) follows: 
= £ l{4 = bMt) e s{g),u,{tA^ =

t^2 

Lrt 

= X] = b,h{g) e s{g),Ui{g,9^ = Ma*(|*),Ui(|‘,0i) > z*{g) - e} 

t^2 

+ X = b,b{^) e s(£),m,(£*, 0J = iia*(|*),w*(£*,0J < 2*(£) - e}- 

t^2 

From the relation between the two indices Ui and Ji we have that 

Ln 

X G s{g),u,{t,i.i) = Ma*(£*),M»(£*,0J > 2 *(£) - e} 

t^2 

- = ^Mt) G s(£),M*(£ ,£i) =Ma*(£‘), Ji(£ ,e) < lid 

L-n 

= X G s{g),Ui{g\0^ = Ma*(£*), 

Ln 

+ X = ^&(£*) G s{g),u^{g ,9^) = «„.(£*), 

t^2 

Mt e) < e) < Mi. e) - 6} 


TMsMt-^)y 

ts ,„s ,^t lOgLj; 


< Xad = b,big) e sig),uM , 0 ') = Wa.(£ ),r;„(5,o(i-1)) < 




Ji(£, e) - 5 

+ X^{d = b,b{^) e s(£),Mj(£ , 0j) =««•(£*) Vz(£\e) < Mi.^ - 

t^2 

Now, the first sum of the last inequality for c = ^ integer is equal to 

L-n. 

Xud = ^K£‘) G s(£),w.(£,0') = «„.(£*),r;o(5.o(t- 1)) < c} 

t^2 

< xiK = ^7^;o(^.o(t-i))<c} 

t^2 

Ln [c/m^j 

= X X =b,Tlo{S^o{t-l)) = siM + m^} 

t—2 s—0 
[c/mjj 

h 


} 


= X X^^d = ^r^o(5'^o(t-1)) = sTO^ + mi} 

s^O t^2 

< [c/m’l\ + 1 


< 


+ 1 = 


log Lr, 


MMii.y - 


+ 1 . 

Thus,


EgY. iK = bMg) e s{g),uMdd = - 1)) < } 


t=2 


Ji{0_,e) -5 


< 


log L„ 


e) - 

Furthermore, 


+ 1 . 


( 34 ) 


Ln 

'Y = b,b{g ) e s(g),Ui(g = Ua-(g ),Ji(g,£)< Ji(g,£) - l5} 

t^2 

Lfi 


Then from (C2) and Remark 3 we have that 

Ln 

EgY iK = b,b{g) G s{g),u,{^,e^ = Ua*(|‘), j*(|‘,e) < Mg,£) - >5} 

t^2 

<o(logL„). (35) 

Now we have that Ui(d ,9^ = Ua* {g) > Us{g ,9^) for any population s which is contained in an 

optimal BFS of 9. Now let b{9 ) = (r, s) and obviously b = {i,s), thus we can show the following 
inequalities 

in 

Y. = b,b{^) e s(|),Ui(£*,0j) = Ua-{^),Ui{^,9^) < z*(g) - ej 

t=2 

< Y Hu,{^, 9^) < z*{g) - £} 

t^2 

Ln 

<Yi{u s{^,§.s) ^ for some j < ST^o{t — 1)} 

t^2 

= Y j ~ 

t^2 

Thus 

Ln 

Eg'^l{7T^ = < 2 *(£) - e} 

t^2 

<o{\ogLn), (36) 

because 

Pi°{\gl -is\> for some j < t) 

t 

<YPiy\^^s-is\>0=o{l/t), 

i=i 
since policy 7r° at any block t chooses 6(0 ) = (r, s) when T jr \t) > T{t — 1).
Finally, it follows from Eq. (IMl) . Eq. (15^ and Eq. (155)) that 

EeT^o{Ln) < —j®—r — + 1 + o(logL„) + o(logL„). 

TO°(J*(£,e) - 6) 

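Now from the definition of $J_i(\underline{\theta}, \varepsilon)$ and (C1) we have that

\[ \lim_{\varepsilon \to 0} J_i(\underline{\theta}, \varepsilon) = K_i(\underline{\theta}), \ \text{for } i \in D(\underline{\theta}), \quad \text{and} \quad \lim_{\varepsilon \to 0} J_i(\underline{\theta}, \varepsilon) = \infty, \ \text{for } i \notin D(\underline{\theta}). \]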


Thus 


□ 


EeT^.o ,iLn) 

lim sup —- 

n^oc iOg Ln 

lim sup —- 

n—)-oo log 


— linin’ ^ E>{9), i G b,b ^ s{9) and 

= 0, for all i ^ D{9),i Gb,b G s(9). 


For the next Lemma, let 0 < e < {z*(9) — maxb^s(g) z^^=^}/2 and c a positive integer. Then we 


define for r = 0,1, 2,... 


Ar = 0 { max < s} and 




Br = Pi { 2 *’“*'= > z*(0) — e, for all 1 < i < t{1 — 1) and c’’ ^ — 1 < 

ba,es{g) 

where 0 < r < 1/|F| is the same as in the 7r°. 


Lemma 5 Under conditions (C2),(C3) 

(i) PfiAr) = 0{c-n, PfiBr) = 0{c-n. 

Moreover, if c > 1/(1 — \F\t) and r > rg then 

(ii) on Ar n Br, b(9 ) G s(0) for all ^ — 1 < 

(Hi) EeJ^o^.iLr.) = j:t 2 Pgib{g) i m) = o(logLn). 

Proof (i) We have that from (C2) 

Pg ( max \z^^^= ^ — z^E§)^ > e) = o{c~^), 1 < j < |E| 

= TC'—l<i-l<C''+l 


holds for the sample mean of the estimates 9 = - - 

Now let q be the smallest positive integer such that 
It = \P~^/P\ we define the sets 


thus it follows that Pg° {Ar) = o(c ’’). 

> c’’+^. For t = Q,...,q and 


= Pi > 2 *(£) _ for all 1 < i < /*} . 

hc6s(e) 

Then by (C3),


Pf{Qt) = = o(c ’■) for t = 0,...,q. 


(37) 


Now given that c’' ^ — 1 < and 1 < i < t{1 — 1), there exists t S {0, such that 

It+i >1 — 1> It >i and therefore for every fix ba we have that 




for every ba £ s{6) on the event no<t<(j Qt- Thus, because of Br D no<t<(j Qt and Eq. (l37l) we have 
that pP {Br) = o{c~'~). 


(ii) Let VP;^{1) 
sampling block. 

We note that 


X)b6s(e) T^o(0 be the number of times that 7r° samples from s{9) up to I 


max T{^o{1) > 

bes{g) 


K(£)(0 

m\ 


(38) 


where |s| denotes the number of elements of s. 

Consider that at any block I and ^ — 1 < we have that Ua*{k ) £ s(0), and Ua>(0 _) 

corresponds to an optimal BFS ba* {0 ). Then if b{9 ) £ s{9) we have the requested. Now, let assume 

that 6(0 ) ^ s(0), and we have that ba*{d .) £ s{9) which means that on ArCiBr the policy 7r° chooses 
from s{9). 

Then since \i) > t{1 — 1), 


) < max + s < z* (9) — e on Ar. 

b^s{g) 


In the case where To ^~\l) > t{1 — 1), we have on the event Ar 


z*{9)-e<z'^’^'^\ 


In the other case where T^o \l) < 't{1 — 1), we have on the event Br 

z*{g)-e< 

On the event Ar O Br, since employs from s(0) at block I and ^ — 1 < and since 

c > 1/(1 — |E|t) it follows that 


V^iil) > - 1 - - 2|^’|) > (|s(£)|)t(Z - 1) (39) 

for all ^ — 1 < and r > tq. 

From Eq. (l38l) and Eq. (l3^ , we obtain on Ar O Br 

max TK{1) > t{1 — 1) (40) 

bGs{g) 


for all c’’ ^ — 1 < if r > tq. 






We note that for r > tq and c’' ^ < Z — 1 < on the event Ar D Br, 

max{z^ : T^o{l) > t{1 — 1) and b ^ s{6)} 

< max + e < z*(9) — e 

b^s(g) = 

< minjz*' : T^o{l) > t{1 — 1) and b G s{ff)} 


the last set is nonempty because of Eq. (IdHll . Hence 6(0 ) G s{d) for all c’’ ^ — 1 < on the 

event fl if r > ro- 

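(iii) Let $c > 1/(1 - |F|\tau)$. Then it follows from (i) and (ii) that for $r > r_0$ and $c^{r-1} \le t - 1 \le c^{r+1}$

\[ P^{\pi^{0}}_{\underline{\theta}}\left( b(\hat{\underline{\theta}}^{t}) \notin s(\underline{\theta}) \right) \le P^{\pi^{0}}_{\underline{\theta}}(\bar{A}_r) + P^{\pi^{0}}_{\underline{\theta}}(\bar{B}_r) = o(c^{-r}) \]

and therefore

\[ \sum_{c^{r-1} \le t - 1 \le c^{r+1}} P^{\pi^{0}}_{\underline{\theta}}\left( b(\hat{\underline{\theta}}^{t}) \notin s(\underline{\theta}) \right) = o(1). \]

Hence,

\[ \sum_{t=2}^{L_n} P^{\pi^{0}}_{\underline{\theta}}\left( b(\hat{\underline{\theta}}^{t}) \notin s(\underline{\theta}) \right) = o(\log L_n). \]

□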
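
The passage from a per-block bound of order o(1) to a total of order o(log L) can be illustrated numerically. The sketch below is a simplified stand-in: it uses non-overlapping blocks $c^{r} \le t - 1 < c^{r+1}$ and assumes a per-step bound of the form $\epsilon_r c^{-r}$ with $\epsilon_r = 1/(r+1)$, which is one hypothetical instance of an $o(c^{-r})$ rate. The function name `block_sum_ratio` is likewise an assumption.

```python
import math

def block_sum_ratio(L, c=2):
    """Sum assumed per-step bounds eps_r * c**(-r) for t = 2..L, divided by log L.

    Block r contains the steps with c**r <= t - 1 < c**(r + 1), and
    eps_r = 1 / (r + 1) is an illustrative vanishing sequence.
    """
    total = 0.0
    r = 0
    while c ** r <= L - 1:
        lo = c ** r
        hi = min(c ** (r + 1) - 1, L - 1)
        per_step = (1.0 / (r + 1)) * c ** (-r)
        total += (hi - lo + 1) * per_step
        r += 1
    return total / math.log(L)

for exponent in (5, 10, 20):
    print(exponent, block_sum_ratio(2 ** exponent))
```

Each full block contributes about $\epsilon_r$, and there are roughly $\log L / \log c$ blocks, so the total grows like a partial harmonic sum while $\log L$ grows linearly in the number of blocks. The printed ratio therefore decreases as L grows.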